Calling for Student Submissions: AI Safety Distillation Contest
At EA UC Berkeley, we’re launching an ongoing series of contests called the Artificial Intelligence Misalignment Solutions (AIMS) series. The second contest in the series, the Distillation Contest, is now open to any student enrolled in a university or college: here are our interest and submission forms! The contest has prizes of up to $2,500 and closes on May 20th. This blog post restates the information on our website, with a bit more explanation of the contest’s purpose.
A huge thank you to Akash for creating the infrastructure and support that allow this project to launch!
This competition is for distillations of posts, papers, and research agendas. For short-form arguments for the importance of AI safety, see the AI Safety Arguments Competition.
Purpose
AIMS Series
I think that it is currently difficult for university students to find tangible ways to engage with AI Safety. Generally, by creating a series of AI Safety contests, I hope to:
Help build social capital for students who are interested in Alignment and potentially good at it.
Create ways for people to test their fit for Alignment work.
Create a “brand” around these contests over time so that CS students recognize the name and winners recommend the contests to their friends. Hopefully, this name recognition would also make it easier to form partnerships with CS organizations.
For this specific contest, I’m inspired by the arguments that the field of AI Alignment needs more distillers, both to improve communication within the field and to make its research accessible to a wider audience. The Distillation Contest aims to produce value by:
Recruiting CS students who have never heard of EA or Alignment before (I will be doing this outreach at UC Berkeley through advertising, but other organizers are welcome to advertise to their own groups).
Increasing the engagement of students who are already interested in Alignment.
Potentially producing useful distillations of Alignment research and increasing accessibility to said research.
Contest Description
The Distillation Contest asks that participants:
1) Pick an article/post/research paper on AI Alignment/Safety (ideally from our list below) that would benefit from being more clearly explained.
2) Indicate which ideas or sections of their chosen research should be distilled. Applicants can either distill a whole post/article, a specific part of the post/article, or multiple posts/articles.
3) Create a distillation: a clearer explanation of the research, along with a new example or new application of the research.
4) Optionally: If the research you’re distilling is trying to solve a problem, you can attempt to propose an additional solution to that problem and include it in your submission.
What makes a good distillation?
A good distillation explains the most confusing parts of another piece of writing; its value comes from creating new ways to understand difficult concepts or dense technical writing. A good distillation also helps readers see how the distilled ideas relate to other Alignment research. Because of this, creating a good distillation will likely require participants to read related research beyond their chosen post in order to make sure they fully understand the ideas presented in it.
As an example of a great distillation, Holden Karnofsky, after writing the Most Important Century series, created a roadmap to make the series more digestible and navigable. Additionally, Scott Alexander has distilled multiple complex dialogues (and even a meme) to make them more accessible.
We encourage applicants to choose a post/article to distill from the list below. Applicants may propose posts/articles outside of this list, although the judges may not consider those pieces convoluted enough to need distillation, so we recommend distilling from the list. (This list may change over time.)
Technical research papers from the Alignment Fundamentals Curriculum, especially the optional readings
Richard Ngo’s AGI Safety from First Principles sequence
Evan Hubinger’s Risks from Learned Optimization sequence
John Wentworth’s posts (see the first comment here):
The Pointers Problem: Human Values are a Function of Human Latent Variables
Debates about how to think about outer alignment and inner alignment (here, here, and here).
Eliciting Latent Knowledge technical report
Prizes
$2,500 - One prize available for 1st place submission.
$1,250 - One prize available for 2nd place submission.
$500 - Up to 5 prizes available.
$250 - Up to 10 prizes available.
All prize winners’ names will be posted on the EA Berkeley website, and selected distillations may also be posted there.
Scoring
Distillations will be scored on the following factors:
Depth of understanding
Clarity of presentation
Rigor of work
Concision/Length (longer submissions will need to present more information than shorter ones)
Originality of insight
Accessibility
Preference may be given to distillations that:
Synthesize multiple sources
Are accessible enough to serve as an introduction to a topic
Final Notes
There are a few other purposes to this contest that I did not list above but may write about in a future forum post! There are also likely some great articles worth distilling beyond the current list of recommendations (which was chosen by Akash Wasil). If you have top recommendations for articles you’d like to see distilled, I may add them to our existing list so that applicants are more likely to distill them.
Finally, since the contest is open to all students, please feel free to share our contest information with university students you know! Here is a link to our current advertising material for other organizers to distribute if they’d like.
Dan Hendrycks and I would love for somebody to distill some of his papers! https://arxiv.org/abs/2008.02275 https://arxiv.org/abs/2110.13136
Hi, are PhD students also allowed to submit? I would like to submit a distillation and would be fine with not receiving any money if I win a prize. If this complicates things too much, I would understand if you don’t want that.
Hi! I’ve been thinking about this a bit more, and I do think I want graduate students to be able to submit! However, since the main audience is meant to be undergraduate students, I may have to be harsher in evaluation, or, more excitingly, maybe I could create a new tier for graduate students? For now, I’d say feel free to submit; I’ll work out more specifics on my end and make an edit (+ reply to this) if I make official changes!
That sounds very reasonable. Thanks for the swift reply.
This is exciting; I really like this idea, and I’m glad it’s being put into action. Do you know who your judges are? I don’t have any technical knowledge myself, so I’m not speaking from an inside view, but one concern I have is whether it might be relatively easy to write distillations that seem good but contain subtle misunderstandings that matter and would be hard for someone not in the field to catch (certainly, in my conversations with friends who know more than I do, talking through my amateur understanding of the Discord chats has revealed some important gaps in my knowledge).
Great point! Early on, I had someone more connected than me make a list of potential judges. We have 15 names brainstormed and grouped by how much they know about alignment. I can say with pretty high certainty that we will have at least one person whose full-time job is alignment reading the submissions (likely someone with a CS doctorate), but hopefully we can get even more expertise :)
Awesome! Will you be releasing the list at some point?
Also, if you’re still on the lookout for more judges, I can potentially send people your way! If not, great!
I wasn’t anticipating releasing the list (partly because people may try to pander to a particular judge’s background, and partly to allow myself and the judges more flexibility in adding people at the last second).
Sending some judge recommendations my way would be great! I think having a variety of readers would be helpful :) Thank you!
Really cool! Just last week I was thinking about whether the alignment community should (massively) scale up prizes with relatively low barriers to entry!
Have you considered making this bigger? E.g. with more prizes and more active outreach to other universities?
I initially thought that, ideally, every contribution that clears a certain bar should be rewarded accordingly; that way there’s less uncertainty about payoffs and more people will contribute.
I think you could likely find more texts to recommend, but even duplicated distillations are still valuable for getting students thinking about alignment research and for identifying particularly promising candidates.
Evaluation time is a likely bottleneck, but you could probably find a handful of, e.g., AGI Safety Fundamentals alumni to volunteer a few hours, or many more if you offer compensation for helping out.
Thank you! These are thoughtful comments! I think I will try to add more texts and find more readers, as you suggest.
I’ve been thinking of making contest creation a potentially serious work project in the future, so I hope to create some larger-scale contests then! Right now, I’m rather limited in capacity. Thankfully, I’m connected with some other great university organizers whom I’ve told about advertising at their schools.
I think it would be tricky to set clear baseline cutoffs for distillations that still capture quality, since writing varies so much between people. Do you have any ideas for clear cutoffs that would retain quality (for future contests if nothing else)?
You probably have already seen that the contest was featured on AstralCodexTen, so you might get more obviously good submissions than you have prizes for, and it would feel like a wasted opportunity not to clearly signal (i.e. with money) to those authors that their work is highly appreciated and that we would love for them to do more of it.
Nice and nice! :)
Hmm, is your worry that distillations that in hindsight seem fairly sub-optimal (e.g. with major mistakes or confusing explanations) end up receiving the lowest-tier prize because of noise introduced by the people who rate the distillations? I think this might happen only rarely, for maybe 2 in 100 distillations? I think your list of scoring criteria already goes a long way toward giving raters a good idea of what solid work looks like. The money for the lowest tier would also not be a lot, maybe $200. Giving a prize to in-hindsight subpar work might reduce the prestige of the prize a little, but I think it’s a fairly junior prize anyway that mostly encourages and rewards solid initial efforts. Also, you would still have the higher tiers for especially good work, which would lose little prestige.
I do think it’s possible that we might award more prizes retroactively if we find we have received a lot of valuable submissions! Maybe an “honorable mentions” category.
Ah, I think my worry is that it feels difficult for me to find a standard to rate against that actually tracks quality. If I give a couple of examples, people may feel limited to making their work look like those examples. I might say “make your distillation 1,000 words and explain two papers and I’ll give you a prize,” but 1,500 words on one paper might have made the optimal submission, and I would have limited people’s options. I find it hard to quantify a bar for writing since everyone has such different approaches. I think the real bar is something more like “the judges who know more about AI Safety than me believe that you have communicated this idea really well,” and because of that it feels wrong for me to say “if you do x you will definitely win something.”
If they already get a prize, I wouldn’t call it “honorable mentions,” because that unnecessarily diminishes it in my eyes. Just have anything that seems like it would get a B- in school be in the same category as the $250 prize?
Ah, interesting, I have the opposite intuition! :D I completely agree that you shouldn’t give advice about the length of the distillations, but the criteria you mention here just seem really useful; I’d be surprised if, e.g., you found something clearly presented and accessible and I didn’t.
And I feel like somebody who has spent ~40 hours reading and discussing AI Safety material (e.g. as part of the AGI Safety Fundamentals course) could do a reasonably coherent job of rating understanding and rigor. Originality seems maybe the trickiest, as you probably have to have some grasp of which ideas/framings are already in the water and which aren’t.
Really excited for this! I think distillation will be useful not only for checking the distiller’s understanding, but also for better communicating ideas around AI safety. Thanks for starting up this project!
Hi, would Anthropic’s research agenda be a good candidate for distilling?
Would the definition of “student enrolled in a university/college” include master’s students? I would normally think so but the ACX signal boost from today describes this as an “undergraduate” contest, so I wanted to double check.
The primary audience for this contest is undergraduates, but Master’s students are allowed!
more distillations, yay! 🥳
Could you organize this to also include people who aren’t enrolled students?
Thank you!
Could you clarify what you mean? Do you mean students who are on a break from college, newly admitted students who aren’t yet attending, or something else?
I am referring to people who chose alternative career paths into AI, for example autodidacts and independent ML researchers.
Unfortunately, I created this contest to help build up university groups, so I think keeping the contest limited to enrolled students (including students who are entering college later this year and students who will graduate before the contest ends) would be the best way to ensure that students feel like they have an advantage in the contest. Thank you for clarifying!
Thanks for the explanation.
Other thoughts: Abram’s decision theory and Vanessa’s infra-Bayesianism work might be good for distillation. Also, it might be worth thinking about some type of collaboration with current distillers, such as Robert Miles or Mark Xu, and the site distill.pub?
Oh, great idea! If nothing else, distill.pub is a great resource for me to list!
Thanks for pointing that out!
Distill was never really about distillations in the sense this post refers to. It was a journal that focused on very high-quality presentation and visualizations. It’s also no longer active: https://distill.pub/2021/distill-hiatus/
Are multiple submissions allowed?
Sure :)